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Abstract 

Background: Today, recognition and classification of sequence motifs and protein folds is a mature field, thanks to 
the availability of numerous comprehensive and easy to use software packages and web-based services. 
Recognition of structural motifs, by comparison, is less well developed and much less frequently used, possibly due 
to a lack of easily accessible and easy to use software. 

Results: In this paper, we describe an extension of DeepView/Swiss-PdbViewer through which structural motifs may 
be defined and searched for in large protein structure databases, and we show that common structural motifs 
involved in stabilizing protein folds are present in evolutionarily and structurally unrelated proteins, also in deeply 
buried locations which are not obviously related to protein function. 

Conclusions: The possibility to define custom motifs and search for their occurrence in other proteins permits the 
identification of recurrent arrangements of residues that could have structural implications. The possibility to do so 
without having to maintain a complex software/hardware installation on site brings this technology to experts and 
non-experts alike. 



Background 

The three-dimensional structure of proteins has been an 
extensively studied topic for several decades. More than 
fifty years ago, Pauling and Corey described the two 
dominant forms of secondary structure, the a-helix and 
the (3-sheet [1]. Subsequently, a variety of further pat- 
terns and regularities (e.g., [2-4]) in protein structures 
have been found, that have proven useful in the context 
of protein structure determination and quality assess- 
ment of determined structures. During the last twenty 
years, increasingly sophisticated methods for secondary 
structure prediction [5,6], fold recognition and compari- 
son (e.g., FSSP [7], THREADER [8], FOLDFIT [9], and 
others [10-12] have been developed, followed by meth- 
ods for fold classification, such as SCOP [13] and CATH 
[14]. More or less simultaneously, methods were devel- 
oped for identifying and searching for structural similar- 
ities involving limited numbers of amino acid residues (e. 
g, [15-19]), and more recently for the prediction of pro- 
tein function, or functional groups, through recognition 
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of geometrical patterns that involve small numbers of 
residues (e.g., [18,20-25]). 

Quite a few of the previously mentioned patterns 
and regularities in protein structures, and the asso- 
ciated methods for detecting them, in particular, have 
constraints and limitations that make them ill-suited 
to searching for general structural motifs, such as 
only dealing with sequentially compact fragments 
[15], only considering amino-acids that are conserved 
among homologous proteins [16], or restricted to 
small subsets of atom types (i.e., N, Ca and C [17], 
Ca and Cp [18], and Ca and pseudo atom at side- 
chain centre of gravity [19]). What is much more im- 
portant from a practical usability perspective, how- 
ever, is the fact that none of the mentioned methods 
have been well integrated into any comprehensive 
and widely available molecular graphics software 
package. Our purpose in this paper is to present the 
mechanisms and facilities whereby structural motifs 
can be defined and searched for using the freely 
available and well established modelling tool Swiss- 
PdbViewer [26], and to present illustrative examples 
of the types of information that can be obtained by 
doing so. In particular, the facilities for defining and 
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searching for structural motifs now available in 
Swiss-PdbViewer include an interactive visual inter- 
face for defining structural motifs, and a machinery 
that is able to quickly search very large collections of 
structures for such motifs. Finally, to our knowledge, 
the presented structure-search machinery is the only 
one to permit arbitrary combinations of amino-acid- 
type constraints, secondary structure constraints, dis- 
tance constraints, and sequence separation 
constraints. 

The main reason for our interest in general (i.e., se- 
quentially non-contiguous) structural motifs, is the cru- 
cial role played by side-chains in the correct packing of 
proteins. In the context of structural protein modelling, 
slight differences in backbone conformations may ac- 
commodate entirely different combinations of side-chain 
conformations [27], and inadequate sampling of con- 
formational space typically leads to suboptimal confor- 
mations being found, which in turn leads to a 
degradation of model quality as artificially loose protein 
cores are formed in order to leave more space for side- 
chains [28]. Side-chain conformations are not currently 
given sufficient consideration, and new approaches need 
to be pursued in order to make it possible to do so. 

Implementation 

Structural motifs 

The notion of structural motifs has not been clearly 
defined, and finding a definition that is both precise 
and useful is not as simple as it might first appear. 
In essence, there are two main methods of quantify- 
ing structural similarity (of proteins). The first and 
most commonly used measure of similarity between 
two n-atom configurations, A and B, is the so-called 
root mean square deviation (rmsd), the value of 
which is obtained as: 

rmsd(A,B) = J- £ (Ik-^ll) 2 (1) 

where each a h b t G 9l 3 corresponds to the three- 
component coordinate vector of atom i in A and B, 
respectively. A meaningful rmsd-value, however, 
requires that the atom-configurations A and B have 
been optimally superposed in a least-squares sense. 
However, such optimal super-positioning of atom- 
configurations requires an 0(n) computational effort 
for configurations of n atoms in three-dimensional 
space (e.g., see [29,30]). When searching for a struc- 
tural motif of k sequentially non-adjacent residues in 
a protein structure P, comprising m residues in total, 
an rmsd-value may need to be calculated as 
described above for every subset of k residues drawn 
from the m residues of P. The total number of /c- 



residue subsets that can be drawn from m residues is 
given by the expression: 

/ m\ m(m-l)(m-2)...(m-k + 1) ml 

\k) = k\ ~ = k\{m-k)V ^ 

and thus increases at least exponentially with respect 
to k (for k< Lm/2j) and as a /cth-order polynomial 
with respect to m. Although the number of operations 
needed to compute one single rmsd-value grows linearly 
with motif size and would thus not be a limiting factor, 
the overall number of computations necessary to evalu- 
ate the superposition of a /c-residue motif loosely defined 
with respect to sequence constraints onto all possible 
combinations of /c-residues drawn from a structure can 
nonetheless become noticeable in practice. 

Furthermore, the rmsd measure is also in itself problem- 
atic in the present context, because values implying a 
meaningful degree of molecular similarity vary with the 
number and type of amino acid residues or atoms being 
used, and is also quite sensitive to outliers. This problem 
has been addressed by several authors [31-33], but the 
solutions proposed tend to involve empirically determined 
parameters and/or probability distributions that depend 
on the number of atoms involved and the presence or ab- 
sence of chemical bonds between said atoms. In contexts 
where the number and types of amino acid residues in 
motifs as well as the sequential distance between residues 
in motifs will be highly variable, and the collection of pro- 
tein structures participating in the analysis is allowed to 
vary (the pdb database is itself a constantly changing en- 
tity) it appears that making effective use of the mentioned 
methods for judging rmsd- values would be difficult. These 
well known issues with the rmsd measure have also 
prompted the assessors of the CASP community to de- 
velop more robust metrics to judge the quality of models 
[34]. The reasons mentioned above prompted us to choose 
a different approach than rmsd to identify atom configura- 
tions that satisfy a motif specification. 

The second of the two main methods of quantifying 
structural similarity uses matrices, and Z) 5 , consisting of 
all internal distances between atoms in the two collections 
of n atoms A and B. 
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where dtj = 1 1 a i ~ a j I b and dfj = | | b t - bj | | 2 , for i = 1,. . .,n 
and /=1,.. .,n. The matrices and D 3 are typically re- 
ferred to as distance matrices. The distance matrix is a well 
known and frequently used concept in structural characteri- 
zations of proteins [e.g., see [35,36]), and does in fact con- 
tain sufficient information to reconstruct the three- 
dimensional structure of each corresponding protein [37]. 
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Given the two distance matrices and D 8 , a measure of 
the structural similarity between the two configurations of 
atoms A and B expressed as a single number, is obtained by 
forming D = - D 3 and evaluating the expression: 

n n . 

I Z \D 9 \, (4) 

i-\ j=l 

The structural similarity measure so obtained has more 
or less the same shortcomings as rmsd-values when it 
comes to interpreting its meaning. In addition thereto, 
the computations implied by Eqs. (3) and (4) require at 
least n 2 operations, and it is thereby computationally 
more expensive than calculating rmsd values, for large 
values of n. 

Defining a motif through an upper limit on the similar- 
ity measure above or by an rmsd upper limit, cannot be 
said to be intuitively obvious with respect to what struc- 
tures satisfying the constraint will look like. Using said 
similarity measures for motif definitions also makes it 
difficult to strengthen or loosen constraints for specific 
atoms or atom pairs, while keeping the constraints on all 
other atom pairs unchanged. A useful alternative to de- 
fining motifs through an upper limit on a single-valued 
similarity measure is to define motifs through distance 
matrices D u and £> L , containing upper and lower 
bounds, respectively, for some subset S of the elements 
of a distance matrix such as or D 8 . A configuration 
of atoms A is then said to be an instance of the n-atom 
motif with upper and lower distance limit matrices Z) u 
and D L if and only if: 

d l < rfi < d 15 

IJ - I) - I) 

for all (/,;') G S. Defining motifs through upper and lower 
distance bounds as just described is intuitively straight- 
forward and flexible with respect to which distances to 
constrain and what constrains to impose on each such 
distance. For collections of amino acids that are suffi- 
ciently small to reappear in multiple unrelated protein 
structures {i.e., ^ 10 aa), it is feasible in practice to 
search for motifs defined through sets of distance con- 
straints despite the large number of potential combina- 
tions implied by Eqn. (2), in part because the set S of 
constrained distances is typically rather small, and in 
part because candidate configurations may be rejected 
upon detection of the first constraint violation. For the 
reasons mentioned, sets of distance constraints are used 
to specify the geometric aspects of motifs in Swiss- 
PdbViewer. 

As can be seen in Figure IB, motif specifications for 
Swiss-PdbViewer express combinations of amino-acid- 
type constraints, secondary structure constraints, geo- 
metric constraints, and sequence-separation constraints. 
Combinations of such constraints provide considerable 



flexibility and are well suited to the specification of par- 
tially known, small and sequentially non-consecutive 
motifs. The described motif specifications and associated 
search-machinery, are however not intended for, and not 
well suited to searching for large motifs, such as protein 
domains or complete proteins. 

DeepView/Swiss-PdbViewer 

The program Swiss-PdbViewer (a.k.a. DeepView) [26] 
was designed to integrate functions for protein structure 
visualization, analysis and manipulation into a sequence- 
to-structure workbench with a user-friendly interface. It 
allows the user to manage complex modelling projects, 
and Swiss-PdbViewer has been augmented with facilities 
whereby general structural motifs may be defined and 
subsequently searched for in a collection of structures 
(through a web server at the Vital-IT Center for High- 
Performance Computing of the Swiss Institute of 
Bioinformatics). 

As one example of how to use Swiss-PdbViewer for 
motif searches, we use the His/Asp/Ser catalytic triad of 
trypsin from Atlantic salmon (pdb: laOj, 1.7 A reso- 
lution). To search for a structural motif such as that 
represented by His57/Aspl02/Serl95, an appropriate set 
of constraints must be specified. This can be done inter- 
actively from within Swiss-PdbViewer by measuring a 
freely chosen collection of distances (after having opened 
a pdb structure file, distance measurement mode is acti- 
vated by clicking on the icon labelled "1.5 A". Individual 
distances are measured by picking pairs of atoms in the 
structure display window, which is displayed when open- 
ing a pdb-file. Distance measurement mode is exited by 
pressing the keyboard's escape key), as illustrated in 
Figure 1A, and subsequently selecting the item "Gener- 
ate 3D Motif from Current Selection ..." in the "Tools" 
menu. Alternatively, programs external to Swiss- 
PdbViewer can be used to generate motif specifications 
that can subsequently be opened into Swiss-PdbViewer 
and used in 3D motif searches as described below. One 
such program external to Swiss-PdbViewer (the perl-script 
make-spdbv-motif ) is provided in Additional file 1. 
Since both methods described create motif specifica- 
tions from existing structures, it is guaranteed that at 
least one structure satisfying the specification exists. Fi- 
nally, regardless of which of the two different methods for 
defining motifs that is used, motif specifications may at 
present comprise a maximum of 32 groups/residues and 
up to 150 distance constraints, with a maximum of 31 
distance constraints between each pair of residues. The 
mentioned limits on groups/residues and distance con- 
straints in motif specifications do not represent inher- 
ent limitations of the method or its implementation. 
The limits may be increased in future releases of 
Swiss-PdbViewer. 
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#SEARCH3D 

# pattern defined from: 1A0J 

# list of residues 

# GroupNum allowed_kind allowed_Sec_Struct ; name chain num 
GROUP OH * ; ' HIS 1 'A' '57 * 

GROUP ID * 1 ASP 1 'A' '102 ' 

GROUP 2 S * 'SER' 'A' '195 ' 

# distances constraints 

# (FromGrp FromAtom ToGrp ToAtom minDist optimalDist maxDist) 
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1 
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6 
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2 


CB 
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6 


9.6 


17 


DIST 


1 


CA 


2 


CA 


9 


0 


10 


0 


11.0 


18 


DIST 


2 


CA 


0 


CA 


7 


4 


8 


4 
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19 


DIST 


0 


CA 


1 


CA 
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7.5 


20 
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# ( FromGrp 


ToGrp min max) 












22 


DELTA 
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0 


44 


44 










23 


DELTA 


2 


1 


93 


93 










24 


# END 





















Figure 1 Motif definition and specification. A) Structural motif involving residues His57, Asp102 and Serl 95 of 1a0j, and interactively measured 
distances, as rendered by Swiss-PdbViewer (with legibility enhancements). B) Textual representation of the motif specification corresponding to 
the image in Figure 1 (the secondary structure constraints have subsequently been set to "*" using a text editor). 



A sample motif specification, corresponding to the 
His/Asp/Ser structural motif, is shown in Figure IB. As 
can be seen, motif specifications consist of three parts, 
each dealing with particular and distinct aspects of the 
motif, and given by lines of text starting with one of 
three characteristic keywords (GROUP, DIST or DELTA). 
In the first part of a motif specification (lines 5-7 in 
Figure IB) each residue in the motif is uniquely asso- 
ciated with a numeric group label, followed by residue 
type and secondary structure restrictions (one of h, s, c, 
*, hs, he, or sc, with * meaning no restriction) that need 
to be satisfied by corresponding residues in actual 



structures. The alphabetic characters used to specify sec- 
ondary structure restrictions have the following mean- 
ings: h = helix, s = strand, c = coil, and sequences of such 
characters as well as sequences of single character 
residue-type abbreviations are seen as being implicitly 
separated by logical disjunctions. In the second part of a 
motif specification (lines 10-19 in Figure IB) distance 
constraints in Angstrom are given that need to hold be- 
tween specific atoms of the motif residues. The atoms 
involved in distance constraints are identified by a group 
label (as given in the first section) and pdb-format atom 
names, and this is followed by three numeric values, 
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corresponding to the least, measured and greatest dis- 
tance, respectively. When motif specifications are defined 
interactively, as described in the previous paragraph, 
users are prompted to enter a tolerance value (x), and 
the greatest and least value of each distance constraint is 
set to the distance measured ±x A, (or ± x%) respect- 
ively. However, all aspects of a motif specification may 
be further altered using a conventional text editor. In the 
third part of a motif specification (lines 22-23 in 
Figure IB) sequence separation constraints can be given 
for the residue labels given in the first part of the motif 
specification. In each sequence-separation constraint, 
column two and three specify the group labels of the 
groups between which the constraint shall hold, and col- 
umns four and five contain the minimum and maximum 
sequence separation between the groups in question. Se- 
quence separation constraints are present in motif speci- 
fications because it is often desirable to impose 
restrictions of this kind, but doing so is not a require- 
ment. To avoid imposing sequence separation con- 
straints, corresponding upper limits can be set arbitrarily 
high (and lower limits set to zero), or the line of text 
specifying the constraint in question can be left out of 
the motif specification altogether. 

Given a motif specification, individual pdb files as well as 
a collection of pdb files can be searched for constellations 
of atoms and amino acid residues that satisfy the con- 
straints in the motif specification. Both of these alternatives 
are available from within Swiss-PdbViewer, by selecting the 
item "Search 3D Motif in Current Layer. . ." or the item 
"Submit 3D Motif Search Against Subset of PDB. . .", both 
of which are located in the "Tools" menu. The collection of 
PDB structures currently searched when selecting the sec- 
ond item is the set of 13180 90% non-redundant X-ray 
structures first mentioned in the section "Common struc- 
tural motifs in related proteins" above, and in forthcoming 
releases of Swiss -PdbViewer further PDB-subsets to search 
will be provided. Submitting a search against a subset of 
the PDB typically yields a list of hits, for which the con- 
straints of the motif specification used were satisfied. 

Upon completion of a search, one line of text is displayed 
for each combination of residues found to satisfy the con- 
straints of the motif specification used. By clicking on a re- 
sult line corresponding to a search hit, the appropriate pdb 
file is loaded into Swiss-PdbViewer and by performing a 
"Search in Current Layer", the corresponding residues are 
selected. It is then easy to superpose the loaded structures 
and display only the selected residues. The selected resi- 
dues of each structure are superposed by selecting the item 
"Fit molecules (from selection)", located in the "Fit" menu. 
Through a dialog box, the user is then given the choice of 
superposing "Carbon Alpha Only", "Backbone Atoms 
Only", "Sidechain Atoms Only" or "All Atoms", and to se- 
lect the reference structure onto which the others will be 



superposed as well as which other structures that are to be 
superposed onto the reference structure. If not already dis- 
played, an alignment of the amino-acid sequences of loaded 
structures, with selected residues highlighted, is displayed 
by selecting the item "Alignment" located in the "Wind" 
menu. For the purpose of defining and searching for struc- 
tural motifs, Swiss-PdbViewer is thus a flexible tool, with 
which the inspection and evaluation of search-results is 
made easier since sets of residues satisfying structural 
motifs are kept track of and highlighted in various contexts. 
In addition, it is of course also possible to analyze or ma- 
nipulate selected structures and/ or substructures using the 
battery of other tools available for this purpose in Swiss- 
PdbViewer. 

Searching for 3-6-residue motifs in a database of 
13180 structures (vide infra) takes 80-100 seconds of 
wall-clock time on a single 2.8 GHz Intel Xeon type pro- 
cessor. The by far most costly part of searches is the 
reading of files containing molecular structures, and the 
variations in measured execution times appear not to be 
correlated with motif size, but instead most likely caused 
by variations in I/O throughput. 

Results 

Common structural motifs in related proteins 

As a first example of a structural motif, we consider the 
well-known His/Asp/Ser catalytic triad of trypsin 
(Figure 1A). Using the coordinates of laOj, a motif speci- 
fication such as that shown in Figure IB, was created 
interactively using Swiss-PdbViewer. A search for the 
generated motif specification was performed across a 
collection of pdb structures (13180 90% non-redundant 
X-ray structures having a resolution of 3.0 A or better 
obtained by using the PISCES sequence culling server 
[38,39]). A total of 33 sets of atom coordinates satisfying 
the motif-specification were found, located in 25 
uniquely named structures. This corresponds to all struc- 
tures (and all catalytic triads) present in our database 
that are related to laOj, in the sense of having a blast 
[40] expectation value (pre-calculated E-value tables 
were downloaded from rcsb.org) less than 10.0 (the max- 
imum blast E-value observed among these was 1.08-10" 
25 ). When the search is repeated with all distance con- 
straint error tolerances increased from 1.0 A to 2.0 A, 
the same 33 sets of atom coordinates from 25 uniquely 
named pdb structures were obtained, indicating that the 
identified geometric configuration is indeed present as a 
distinct motif in the corresponding structures. 

Common structural motifs in unrelated proteins 

As a second example of a structural motif, we consider 
the well-known, so-called DxDxDG calcium-binding 
motif of calmodulin [19] that are present in the Ca 2 
+ -binding helix-loop-helix structures of many calcium- 
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binding proteins. Calmodulin has four calcium-binding 
motifs (at Asp20, Asp22, Asp24, Gly25, at Asp56, Asp58, 
Asn60, Gly61, at Asp93, Asp95, Asn97, Gly98, and at 
Aspl29, Aspl31, Aspl33, Glyl34, respectively, in pdb id 
lexr), the pairwise backbone rmsd-values between which 
are all less than 0.5 A. The second and the third motifs are 
called DxDxDG-like motifs since they have an asparagine 
in the third position, serine is also common in this pos- 
ition [19]. Using the coordinates of Paramecium tetraure- 
lia calmodulin (pdb: lexr, 1.0 A resolution), a motif 
specification (Additional file 2) was created (using the 
script make -s pdb v-motif, in Additional file 1) for the 
first calcium binding motif given above. Searching for this 
motif in our non-redundant database returned 134 struc- 
tural hits located in 54 distinct structures. Of these 54 
structures, the primary sequences of 43 have blast E-values 
of less than 10.0 when compared to the primary sequence 
of lexr (the most distant was human frequenin with an E- 
value of 1.2), and all the DxDxDG EF-hands were retrieved, 
except two of the four DxDxDG EF-hands present in 2ix7. 
Quite notably, 2ix7 is an apo-calmodulin bound to the first 
two IQ motifs of myosin V [41] and the two EF-hands that 
do not satisfy the distance constraints defined in the motif 
specification adopt conformations that are clearly distinct 
from those from which the motif specification was gener- 
ated. The pdb ids of the remaining 11 structures, consid- 
ered to be unrelated to lexr (E-value > 10.0), are lace, 
lklx, ltxv, lux6, lwza, lyo8, 2 h61, 2hpk, 2scp, 2z30 and 
2z8r (lace [anthrax protective antigen], lklx [4-alpha-glu- 
canotransferase, C-terminal domain], ltxv [integrin alpha 
N- terminal domain], lux6 [thrombospondin-1], lwza [bac- 
terial alpha-amylase], lyo8 [thrombospondin C-terminal 
domain], 2 h61 [calcyclin (S100)], 2hpk [photoprotein ber- 
ovin], 2scp [sarcoplasmic calcium- binding protein], 2z30 
[tk-subtilisin], 2z8r [YesW protein]). Of these structures, all 
have a Ca 2+ ion in the proximity of the three asparagines 
except 2hpk, in which the Ca 2+ ion is substituted by a Mg 2 
+ ion. Furthermore, it is only in 2hpk, 2 h61 and 2scp that 
the DxDxDG motifs are situated in the characteristic helix- 
loop-helix setting, but as has been previously noted the 
DxDxDG structural motif does occur in a variety of struc- 
tural contexts [19]. Figures 2A & B show a subset of the 
mentioned structures aligned with respect to their com- 
mon DxDxDG motif, and as can be seen, the structures are 
clearly different aside from their common structural motif. 

Our third example of structural motifs, concerns pig 
insulin. Functional residues are evolutionarily conserved 
in proteins, and we assumed that residues that contrib- 
ute to the folding and/or stability of a protein region are 
also good candidates for conservation. As a part of inves- 
tigations with a different purpose, residues of importance 
for the stability of the pig insulin fold have been identi- 
fied using CMEPS [42] and FOLDEF [43]. According to 
these results, LeuBll and LeuB15 (pdb id: 4ins, chain B) 



are among the most important for fold stability, with the 
structural neighbors TyrB26 and ValB12 being identified 
as potential contributor and non-contributor to fold sta- 
bility, respectively. The importance of ValB12 is on the 
other hand suggested by experimental results [44]. 

A motif specification was designed based on LeuBll, 
ValB12, LeuB15 and TyrB26 in pig insulin (pdb: 4ins, 
1.5 A resolution), with the strict 11-residue sequence 
separation constraint between the 3 rd and 4 th residue of 
the motif loosed by permitting deviations of ±25 
(Additional file 3, line 22). As the result of this search, 
the very same spatial constellation of Leu, Val, Leu and 
Tyr was found to be present in the His6 enzyme from 
Saccharomyces cerevisiae ([45], pdb: 2agk). 

As is clearly seen in Figure 3, the geometries of 4ins 
and 2agk are very different, with the exception of the 
four residues forming the motif that was searched for. 
Although this is per se not a proof that these residues 
contribute to the stability of the yeast protein, the men- 
tioned residues are strictly conserved in homologues of 
the yeast protein, despite a sequence identity of only 
about 55% for these homologues. When insulin and 
insulin-like proteins found in UniProtKB (E < 0.1) are 
aligned (and assuming Tyr and Phe to be interchange- 
able at the fourth position), there is a strict sequence 
conservation of the motif under discussion, except for 
three sequences (Additional file 4). Two of these 
sequences originate from Microcavia niata and Galea 
musteloides, belonging to the caviomorph group of 
rodents known to have a deficient insulin that exhibits 
only 1 - 10% of biological activity in comparison to 
other mammals [46]. The third sequence comes from 
the whole genome shotgun of Tetraodon nigroviridis, 
and it would be interesting to have the sequence con- 
firmed and stability checked for this insulin. 

When the CMEPS- and FOLDEF-calculations men- 
tioned above are repeated for pdb id 2agk, the residues 
Leul93, Leul97 and Tyr211 (corresponding to LeuBll, 
LeuB15 and TyrB26, respectively, in 4ins) were identified 
as the most important for fold stability and Val 194 was 
identified as a potential contributor to fold stability 
(Additional file 5). 

Our fourth (and final) example of structural motifs 
concerns rabbit L-gulonate 3-dehydrogenase (pdb: 2dpo, 
1.70 A resolution). Residues Val9, Hell, Ala22, Val32 and 
Leu34 in 2dpo are part of the constellation of two (3- 
sheets and an a-helix shown in Figure 4 A & B (centre 
images). Using the coordinates of 2dpo, a motif specifica- 
tion comprising the above mentioned residues was cre- 
ated (Additional file 6). The defined motif was found to 
be present (among others) also in pdb ids lo94 (tri- 
methylamine dehydrogenase) and 2obk (putative Se 
binding protein), both of which are unrelated to 2dpo 
(E-value > 10.0). Figures 4 A & B show the mentioned 
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Figure 2 DxDxDG structural motifs in a variety of structural contexts. A) Structures of two unrelated hits with pdb ids 1txv (chain A, left), 
2z30 (chain A, right) and 1exr (centre), aligned with respect to their common structural motif. B) Expansions of the corresponding structures in A, 
to clearly show the common structural motif. The backbone rmsd:s of the presented structural motifs to the original configuration in 1exr are 0.33 
and 0.26 A for 1txv and 2z30, respectively. In both A) and B), the residues of each protein that satisfied the motif specification have been 
coloured red, and the stretches of residues preceding and following the motif have been coloured in yellow and blue, respectively. 



structures aligned with respect to their common motif, 
and as can be seen, the structures are once again clearly 
different (ignoring their common structural motif). 
According to FoldX (Version 2.5.1) [43,47] results, all 
residues belonging to the above mentioned motifs are 
expected to be important for the stability of the protein 
folds, with folding free energy changes upon alanine mu- 
tation ranging from 1.8 to 4.3 kcal/mol (Additional file 7, 
Additional file 8 and Additional file 9, for completeness 
and comparison, FoldX results for 4ins and 2agk are pro- 
vided in Additional file 10 and Additional file 11). 



Discussion 

The purpose of the examples given above is to illustrate 
some of the possibilities available when searching for 
structural motifs using Swiss-PdbViewer. Neither ex- 
ample is intended to be a comprehensive treatment of 
the corresponding topic (e.g., searching for calcium- 
binding motifs). Furthermore, searching for structural 
motifs should not in itself be expected to be competitive 
with methods developed for and dedicated to identifying 
specific properties of proteins, such as being calcium- 
binding [48,49], zinc-binding [50,51], exhibiting catalytic 
activity [52], etc. In particular, methods dedicated to 
identifying specific properties of proteins are often based 
on machine-learning techniques and make extensive use 



of sets of parameters chosen and parameter-values tuned 
for the specific problem being addressed {e.g., [50,51]). 

The examples given above also show that searches for 
structural motifs can be set up in different ways. Choices 
of atom-type pairs between which to impose distance 
constraints, distance constraint tolerance limits, etc., can 




Figure 3 A deeply buried 4-residue structural motif. Structures 
of 4ins chain B (yellow) and 2agk (blue), oriented such that the four 
residues forming the common structural motif (in red) are optimally 
superposed (the backbone rmsd for the common motif is 0.36 A). 
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Figure 4 A deeply buried 5-residue structural motif. A) Structures of two unrelated hits with pdb ids 1o94 (chain A, left), 2obk (chain A, right) 
and 2dpo (centre), aligned with respect to their common structural motif. B) Expansions of the corresponding structures in A, to clearly show the 
common structural motif. The backbone rmsd:s of the presented structural motifs to the original configuration in 2dpo are 0.38 (1o94) and 0.38 A 
(2obk), respectively. The structures have been coloured as described in the text for Figure 2. 



obviously depend both on what is being searched for and 
the quality of the structures that are searched. Since 
motif specifications are text files, they can easily be edi- 
ted (using conventional text editors) to fit particular user 
requirements prior to starting/submitting the search. For 
example, the strict requirement of having an aspartate 
for the third residue of the motif presented in Additional 
file 1 could be relaxed to also tolerating an asparagine 
simply by changing the D to DN. Likewise, individual 
distance constraints can be tightened or relaxed at will. 

As is clearly demonstrated by the examples we have 
presented, common structural motifs are indeed present 
and possible to find in evolutionarily and structurally un- 
related protein structures in the Protein Data Bank [53]. 
For the observed motifs, backbone rmsd-values are less 
than 0.5 A, which is less than that typically observed 
across the ensemble for atoms in protein structures 
determined by NMR spectroscopy [54,55]. Thus, consid- 
ering the similar geometric configurations of amino acid 
residues that we have observed in different structures to 
be instances of common motifs, is well justified. 

Previous studies of sequentially non-contiguous struc- 
tural motifs have been almost exclusively concerned with 
functional groups on the surfaces of proteins. By con- 
trast, we have also observed structural motifs that exist 



deeply buried in the interiors of structures (third and 
fourth example above). 

Considering the relative ease with which the given 
examples were found, we expect such motifs to be a fre- 
quently occurring phenomenon. A large number of un- 
answered questions remain, however. For example, how 
many such motifs are present on average in each protein 
structure? In how many distinct structures is a specific 
motif typically present?, etc. Due to the crucial role of 
side-chain packing in native protein structures, we sus- 
pect that structural motifs may become useful for pro- 
tein structure prediction and refinement. 

Conclusions 

Investigations to address the questions posed above, as 
well as to evaluate the usefulness of structural motifs for 
structure prediction and refinement are currently under- 
way. Irrespectively, however, it is already clear that the 
mechanisms to search for structural motifs integrated 
into DeepView/Swiss-PdbViewer is a useful and valuable 
tool. The processing time to search for structural motifs 
of potentially interesting kinds is sufficiently small that it 
can be used as a standard technique whenever the kinds 
of information illustrated by our examples would be use- 
ful. Furthermore, thanks to being integrated into 
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DeepView/Swiss-PdbViewer, structural motifs can not 
only be defined by running external programs, but can 
also be interactively defined with direct visual feedback, 
from within DeepView/Swiss-PdbViewer. Finally, struc- 
ture searches, irrespectively of how motifs have been 
defined, are submitted from within Deep View/ Swiss- 
PdbViewer, so that anyone can benefit from this search- 
ing capability without having to maintain a complex 
hardware/software installation. 

Availability and requirements 

Project name: Swiss-PdbViewer 

Project home page: http://spdbv.vital-it.ch; tutorial: 
http://spdbv.vital-it.ch/motif3Dsearch_tut.html 
Operating system(s): Microsoft Windows and Mac 
OSX 

Programming language: ANSI C 
Other requirements: None 

License: freely available in binary/executable form. 
Any restrictions to use by non-academics: No 

Additional files 



scanning of 2dpo using FoldX (see main text for citations) follow 
immediately below. Bold letters and digits are used for residues and 
values belonging to the motifs discussed in the text. Energies are in kcal/ 
mol. 

Additional file 10: The (raw) results of computational alanine 
scanning of 1o94 using FoldX (see main text for citations) follow 
immediately below. Bold letters and digits are used for residues and 
values belonging to the motifs discussed in the text. Energies are in kcal/ 
mol. 

Additional file 11: The (raw) results of computational alanine 
scanning of 2obk using FoldX (see main text for citations) follow 
immediately below. Bold letters and digits are used for residues and 
values belonging to the motifs discussed in the text. Energies are in kcal/ 
mol. 
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